Training IBM Watson Using Automatically Generated Question-Answer Pairs

Authors

  • Jangho Lee
  • Gyuwan Kim
  • Jaeyoon Yoo
  • Changwoo Jung
  • Minseok Kim
  • Sungroh Yoon
Abstract

IBM Watson is a cognitive computing system capable of question answering in natural languages. It is believed that IBM Watson can understand large corpora and answer relevant questions more effectively than any other question-answering system currently available. To unleash the full power of Watson, however, we need to train an instance of it with a large number of well-prepared question-answer pairs. Manually generating such pairs in large quantities is prohibitively time-consuming and significantly limits the efficiency of Watson's training. Recently, a large-scale dataset of over 30 million question-answer pairs was reported. Under the assumption that such an automatically generated dataset could relieve the burden of manual question-answer generation, we used this dataset to train an instance of Watson and measured the training efficiency and accuracy. According to our experiments, the auto-generated dataset was effective for training Watson, complementing manually crafted question-answer pairs. To the best of the authors' knowledge, this work is the first attempt to use a large-scale dataset of automatically generated question-answer pairs for training IBM Watson. We anticipate that the insights and lessons obtained from our experiments will be useful for researchers who want to expedite Watson training by leveraging automatically generated question-answer pairs.

Similar resources

Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model

In recent years, knowledge graphs such as Freebase, which capture facts about entities and the relationships between them, have been used actively for answering factoid questions. In this paper, we explore the problem of automatically generating question-answer pairs from a given knowledge graph. The generated question-answer (QA) pairs can be used in several downstream applications. For example, they...

Augmenting Conversational Characters with Generated Question-Answer Pairs

We take a conversational character trained on a set of linked question-answer pairs authored by hand, and augment its training data by adding sets of question-answer pairs which are generated automatically from texts on different topics. The augmented characters can answer questions about the new topics, at the cost of some performance loss on questions about the topics that the original charac...

Question Generation for Question Answering

This paper presents how to generate questions from given passages using neural networks, where large-scale QA pairs are automatically crawled and processed from a Community-QA website and used as training data. The contribution of the paper is twofold: First, two types of question generation approaches are proposed, one is a retrieval-based method using a convolutional neural network (CNN), the other...

In the game: The interface between Watson and Jeopardy!

To play as a contestant in Jeopardy!, IBM Watson needed an interface program to handle the communications between the Jeopardy! computers that operate the game and its own components: question answering, game strategy, speech, buzzer, etc. Because Watson cannot hear or see, when the categories and clues were displayed on the game board, they were also sent electronically to Watson. The progra...

Leveraging Video Descriptions to Learn Video Question Answering

We propose a scalable approach to learn video-based question answering (QA): to answer a free-form natural language question about the contents of a video. Our approach automatically harvests a large number of videos and descriptions freely available online. Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated. Next, we use thes...

Journal:
  • CoRR

Volume: abs/1611.03932

Publication date: 2017